6.2 String Theory

“There is geometry in the humming of the strings” - Pythagoras

Alphabets

Definition

An alphabet is a nonempty finite set, usually denoted \(\Sigma\). > >Elements of the set \(\Sigma\) are called letters or symbols or characters.

Here \(\Sigma\) is just a Greek letter being used for the name of a particular set; it has nothing to do with summation.

Example

Here are some common examples of alphabets: > >- \(\{0,1\}\) (the set of binary digits) >- \(\{\mathtt{a},\ldots,\mathtt{z}\}\) (the set of lowercase ASCII letters) >- All ASCII characters >- All Unicode characters

Strings

Definition

A string over \(\Sigma\) is a finite, but possibly empty, sequence of characters from \(\Sigma\). > If the alphabet is obvious from context, we just say string.

Notation: Dropping the Quotes

In Python, to write a string we need to use quotes, e.g., '37', or "37". This lets us tell the difference between, e.g., the string '37' and the integer 37.

In Theory of Computability, though, we are always working with strings, and using quotation marks all the time quickly becomes tedious. Therefore, when studying Computability we use the convention that we omit all the quotation marks. For example, we would write \(\mathtt{abc}\) instead of "abc". #### Is x a symbol or a string of length 1?

It’s true that if we’ve dropped the quotes, the notation x is ambiguous; it could be a character from our alphabet \(\Sigma\), or it could be a string of length 1 containing that character.

In practice, the difference rarely matters. And in those very rare cases where it does matter, we can tell from context which interpretation was intended.

What about the empty string?

If we omit the quotes, then the empty string "" (a sequence of zero characters from our alphabet) turns invisible, and we can’t see whether there’s a string there at all! This is awkward, so for the empty string we will use different notation.

We could continue to write "" as is done in many programming languages, but the tradition in computability is to use a greek letter. In this class, we will use the symbol \(\epsilon\) to stand for the empty string.

(Some books use \(\lambda\) instead of \(\epsilon\) but there are better uses for \(\lambda\) in theoretical computer science, as you will see in the Programming Languages course.)

Note that \(\epsilon\) is notation for a string (a sequence of characters), rather than a character. Consequently, we can be sure that \(\epsilon\not\in\Sigma\). #### But what if my alphabet is \(\{\alpha, \beta, \gamma, \delta, \epsilon, \ldots \omega\}\) ?

In the rare case where you want to use the Greek letter epsilon as a character in your strings, you just need to choose a different symbol (such as "") to represent the empty string. ### Operations on Strings

Definition

If \(w\) and \(v\) represent strings, then > >- \(wv\) is the result of concatenating \(w\) and \(v\) together. >- \(|w|\) is the length of \(w\), the number of characters. >- \(w^R\) is the reverse of \(w\), the same characters in the opposite order.

Example

Various properties of this operation immediately follow, e.g., > >- \(w\epsilon = w = \epsilon w\) >- \(|w| = |w^R|\) >- \(|wv| = |w| + |v|\) >- \((wv)^R= v^R w^R\)

Formal Languages

Definition

A formal language (over \(\Sigma\)) is a set of strings (over \(\Sigma\)) > Formal languages can be empty, finite or infinite, but (by definition of “string”) they only contain finite strings.

Example

Since all the languages we’ll discuss in Computability Theory are formal (i.e., mathematically precise), we will often just say language to mean a set of strings.

Operations on Languages

Because languages are sets, we can union two languages (\(L\cup M\)) intersect two languages (\(L\cap M\)), or take the set-difference (\(L\setminus M\)). But since they are more specifically sets of strings, we can define additional operations.

Definition

If \(L\) and \(M\) are languages, then > >- \(LM := \{\ wv\ |\ w\in L,\ v\in M\ \ \}\) >- \(L^1 := L\)
>\(L^2 := LL = \{\ wv\ |\ w\in L, v\in L\ \}\)
>\(L^3 := LLL = \{\ wvu\ |\ w\in L, v\in L, u\in L\ \}\)
>In general, \(L^{n+1} := L\,L^n\) where we define \(L^0 := \{\epsilon\}\). >- \(L^* := L^0 \cup L^1 \cup L^2 \cup L^3 \cup \cdots\)
> i.e., \(L^*\) is all the strings we can construct by picking strings from \(L\) zero or more times (but finitely many) and concatenating them. >- \(L^+ := L^1 \cup L^2 \cup L^3 \cup \cdots\)
> i.e., \(L^+\) is all the strings we can construct by picking strings from \(L\) one or more times (but finitely many) and concatenating them.

Example

If \(L = \{\mathtt{ba}, \mathtt{da}\}\) and \(M = \{\mathtt{da}, \mathtt{rk}, \epsilon\}\), then > >- \(L\cup M = \{\mathtt{ba}, \mathtt{da}, \mathtt{rk}, \epsilon\}\) >- \(L\cap M = \{\mathtt{da}\}\) >- \(L\setminus M = \{\mathtt{ba}\}\) >- \(LM = \{\mathtt{bada}, \mathtt{bark}, \mathtt{ba}, \mathtt{dada}, \mathtt{dark}, \mathtt{da}\}\) >- \(L^3 = \{\mathtt{bababa}, \mathtt{babada}, \mathtt{badaba}, \ldots, \mathtt{dadaba},\mathtt{dadada}\}\) >- \(L^{*} = \{\epsilon, \mathtt{ba}, \mathtt{da}, \mathtt{baba}, \ldots, \mathtt{dada}, \mathtt{bababa}, \ldots\}\) >- \(L^+ = \{\mathtt{ba}, \mathtt{da}, \mathtt{baba}, \ldots, \mathtt{dada}, \mathtt{bababa},\ldots\}\) >- \(M^{*} = \{\epsilon, \mathtt{da}, \mathtt{rk}, \mathtt{dada}, \ldots, \mathtt{rkrk}, \mathtt{dadada}, \ldots\}\) >- \(M^+ = \{\epsilon, \mathtt{da}, \mathtt{rk}, \mathtt{dada}, \ldots, \mathtt{rkrk}, \mathtt{dadada}, \ldots\}\) > Note that \(\epsilon\in L^*\) and \(\epsilon\in M^*\) by definition of the star operator. On the other hand, \(\epsilon\not\in L^+\) (because there’s no way to construct an empty string by concatenating one or more elements of \(L\)), but \(\epsilon\in M^+\) (because we can construct an empty string by concatenating one or more elements of \(M\), as long as those elements are \(\epsilon\)).

Example

Given the definitions of the operations, the following equivalences all hold. > >- \(L\emptyset = \emptyset\) > > There is no way to concatenate something from \(L\) with something from \(\emptyset\). > >- \(L\{\epsilon\} = L\) >- \(\{\epsilon\}^{*} = \{\epsilon\}\) > > Appending zero or more empty strings only yields the empty string. > >- \(\{\epsilon\}^+ = \{\epsilon\}\) > > Appending one or more empty strings only yields the empty string. > >- \(\emptyset^{*} = \{\epsilon\}\) > >Appending zero strings from \(\emptyset\) yields an empty string by definition. > >- \(\emptyset^+ = \emptyset\) > > There are no ways to concatenate one or more strings taken from the empty set. > >- \((L\cup M)N = LN \cup MN\) >- \((L^*)^* = L^*\)

Previous: 6.1 Introduction to Computability

Next: 6.3 State Machines